Reducing Rule Covers with Deterministic Error Bounds

Authors

  • Vikram Pudi
  • Jayant R. Haritsa
Abstract

The output of boolean association rule mining algorithms is often too large for manual examination. For dense datasets, it is often impractical even to generate all frequent itemsets. The closed itemset approach handles this information overload by pruning “uninteresting” rules, following the observation that most rules can be derived from other rules. In this paper, we propose a new framework, namely, the generalized closed (or g-closed) itemset framework. By allowing for a small tolerance in the accuracy of itemset supports, we show that the number of such redundant rules is far greater than previously estimated. Our scheme can be integrated into both levelwise algorithms (Apriori) and two-pass algorithms (ARMOR). We evaluate its performance by measuring the reduction in output size as well as in response time. Our experiments show that incorporating g-closed itemsets provides significant performance improvements on a variety of databases.
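To make the idea of a support tolerance concrete, here is a minimal Python sketch on a toy dataset. It is not the paper's algorithm or its formal g-closed definition, which the abstract does not give. It assumes a simplified rule: an itemset is treated as redundant when some frequent proper superset has a support count within `tolerance` of its own. With `tolerance = 0` this reduces to ordinary closed itemsets. The function names, the brute-force enumeration, and the data are all hypothetical.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration of frequent itemsets (toy-sized inputs only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = support_count(candidate, transactions)
            if count >= min_count:
                result[candidate] = count
    return result

def prune_with_tolerance(freq, tolerance):
    """Assumed simplification, not the paper's definition: drop X when a
    frequent proper superset Y satisfies count(X) - count(Y) <= tolerance."""
    kept = {}
    for x, cx in freq.items():
        redundant = any(x < y and cx - cy <= tolerance for y, cy in freq.items())
        if not redundant:
            kept[x] = cx
    return kept

# Hypothetical toy database of five transactions.
transactions = [set("abc"), set("abc"), set("abc"), set("ab"), set("a")]

freq = frequent_itemsets(transactions, min_count=2)
closed = prune_with_tolerance(freq, tolerance=0)   # exact closed itemsets
relaxed = prune_with_tolerance(freq, tolerance=1)  # supports may be off by 1

print(len(freq), len(closed), len(relaxed))
```

On this toy data the seven frequent itemsets shrink to three closed itemsets at tolerance 0 and to a single itemset at tolerance 1, which illustrates how a small tolerance can expose more redundancy. The sketch does not control how errors accumulate when a pruned itemset's superset is itself pruned; bounding that error deterministically is the kind of guarantee the paper's title refers to.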

Similar resources

A Novel Qualitative State Observer

The state estimation of a quantized system (Q.S.) is a challenging problem for designing feedback control and model-based fault diagnosis algorithms. The core of a Q.S. is a continuous variable system whose inputs and outputs are represented by their corresponding quantized values. This paper is concerned with state estimation of a Q.S. by a qualitative observer. The presented observer in this pape...

Reduced-order performance of parallel and series-parallel identifiers with weakly observable parasitics

The stability properties of discrete-time parallel and series-parallel identifiers with respect to a specific model-plant order mismatch are analyzed. While in a deterministic environment with no modeling error the two schemes give identical results, when used in a deterministic environment with modeling error their performance is different. We assume a singularly perturbed state representation...

Adaptive integration for multi-factor portfolio credit loss models

We propose algorithms of adaptive integration for calculation of the tail probability in multi-factor credit portfolio loss models. We first devise the classical Genz-Malik rule, a deterministic multiple integration rule suitable for portfolio credit models with number of factors less than 8. Later on we arrive at the adaptive Monte Carlo integration, which simply replaces the deterministic int...

Stochastic and Deterministic Approaches to Estimation in H∞

This paper examines system identification methods from frequency response data that have recently emerged under the title of 'Estimation in H∞'. We briefly review this work and examine in detail the effects of model order on linear algorithms. This leads to a model order selection criterion which has not been previously discussed in the literature. All the existing literature in this area examine...

Rectangle Size Bounds and Threshold Covers in Communication Complexity

We investigate the power of the most important lower bound technique in randomized communication complexity, which is based on an evaluation of the maximal size of approximately monochromatic rectangles, with respect to arbitrary distributions on the inputs. While it is known that the 0-error version of this bound is polynomially tight for deterministic communication, nothing in this direction ...

Publication date: 2003